Gene copy-number differences due to gene duplications and deletions are rampant in
natural populations and play a crucial role in the evolution of genome complexity. Per- 
locus analyses of gene duplication rates in the pre-genomic era revealed that gene 
duplication rates are much higher than the per nucleotide substitution rate. Analyses of 
gene duplication and deletion rates in mutation accumulation lines of model organisms 
have revealed that these high rates of copy-number mutations occur at a genome-wide 
scale. Furthermore, comparisons of the spontaneous duplication and deletion rates to copy- 
number polymorphism data and bioinformatic-based estimates of duplication rates from 
sequenced genomes suggest that the vast majority of gene duplications are detrimental 
and removed by natural selection. The rate at which new gene copies appear in populations 
greatly influences their evolutionary dynamics and standing gene copy-number variation in 
populations. The opportunity for mutations that result in the maintenance of duplicate 
copies, either through neofunctionalization or subfunctionalization, also depends on the 
equilibrium frequency of additional gene copies in the population, and hence on the 
spontaneous gene duplication (and loss) rate. The duplication rate may therefore have 
profound effects on the role of adaptation in the evolution of duplicated genes as well as 
important consequences for the evolutionary potential of organisms. We further discuss 
the broad ramifications of this standing gene copy-number variation on fitness and adaptive 
potential from a population-genetic and genome-wide perspective. 
INTRODUCTION

The publication of Ohno's "Evolution by Gene Duplication" is fit- 
tingly viewed as a milestone in the study of gene duplications 
(Ohno, 1970). In addition to collating evidence for duplications 
in evolution, it also presented several hypotheses that have since 
been undergoing robust testing and analyses. For example, Ohno 
perceived that segmental duplications would be associated with 
problems with gene dosage balance and genetic instability, and 
therefore he also placed a great significance on whole-genome 
duplications. Additionally, he viewed the duplicate copy of a gene 
as an initially passive element in the evolution of new genes. A 
duplicated gene was seen as superfluous and therefore not under 
selection after duplication, that is, not until subsequent mutations 
conferred novel beneficial functions. Therefore, Ohno predicted 
that in the majority of instances, a gene duplicate would be lost or 
degenerate into a pseudogene. 

The first characterized segmental gene duplication was the bar 
mutation in Drosophila melanogaster (Sturtevant, 1925). Soon 
after the discovery of the bar mutation, Bridges (1935, 1936) 
suggested that the duplication of genes provided a mechanism 
for increasing chromosome length and providing material for 
subsequent functional changes. This potential borne by gene 
duplication for evolutionary change was further emphasized by 
early geneticists and evolutionary biologists like Haldane, Müller,
and Huxley (Haldane, 1933; Müller, 1935, 1936; Huxley, 1942).
The bar mutation also serves as an illustration of several general 
features that should be emphasized about duplications. First, 
although it is "simply" a duplication of previously existing material 
that is expected to increase "redundancy" in the genome, the dupli- 
cation has a striking phenotype. Gene duplication theory often 
treats duplications as having no immediate consequences after 
conception under the general assumption that gene duplicates 
must endure a passive existence in the genome until subsequent 
mutational events shape their eventual fate toward nonfunctional- 
ization, subfunctionalization, or neofunctionalization. Thus, the 
immediate phenotypic and fitness consequences of duplications 
have not received the same degree of attention. Second, the fit- 
ness consequences of the bar mutation are most likely deleterious 
(Geer and Green, 1962). Although there is abundant evidence 
of beneficial duplications, particularly in the context of stressful
or perturbed environmental conditions (Maroni et al., 1987;
Theodore et al., 1991; Brown et al., 1998; Evgen'ev et al., 2004;
Hemingway et al., 2004; Gonzalez et al., 2005; Deng et al., 2010;
Näsvall et al., 2012; among others), changes in gene copy-number
are usually deleterious (Lupski, 1998; Inoue and Lupski, 2002;
Botstein and Risch, 2003; Bailey and Eichler, 2006; Sebat et al.,
2007). Before the recent advances in detecting copy-number
changes, an estimated 29% of human genetic diseases were 
thought to result from gene copy-number changes, with 22 and 
7% stemming from gene deletions and duplications, respectively 
(Botstein and Risch, 2003). Lastly, Sturtevant and Morgan (1923) 
discovered that the segmental duplications that gave rise to the bar 
phenotype were unstable. Although the original experiments on
the bar mutation do not provide an estimate of the duplication rate,
the frequencies of reversion due to duplication loss and of
double-bar mutations arising from bar flies were very high,
on the order of 10^-3 per generation (Sturtevant,
1925). These early experiments with the bar mutation therefore 
showed that gene copy-number changes can occur at much higher 
rates than point mutations. 

The study of structural genetic variation is undergoing an 
epochal resurgence. The reasons for this increased interest are 
largely technical. The explosive increase in the number of 
sequenced genomes has made it abundantly clear that the primary 
source of new genes is gene duplication, as previously advanced 
by Ohno (1970). Complementarily, high-throughput screens of
structural variation in natural populations have demonstrated
that there is abundant variation in gene copy-number
that we were previously unable to detect on a genome-wide
scale (Iafrate et al., 2004; Sebat et al., 2004; Maydan et al.,
2007; Emerson et al., 2008; among others). Finally, direct
measurements of mutation rates have shown that structural genetic
variation arises much more frequently than bioinformatic anal- 
ysis of the age-distribution of extant duplicates in the first 
sequenced genomes had suggested (Lynch et al., 2008; Lipinski
et al., 2011; Schrider et al., 2013). The high frequencies of
spontaneous genome rearrangements and gene copy-number variants
(CNVs) have important implications for the evolution of novel 
genes, speciation and hereditary disease. Much of the recent 
work in gene duplication has focused on gene copy-number poly- 
morphisms in natural populations, and testing hypotheses of 
functional divergence between paralogs. Here, however, we review 
recent developments on two related topics regarding gene duplica- 
tions, namely the spontaneous rate of segmental gene duplications 
and deletions, and their fitness consequences. 

THE FATE OF DUPLICATED GENES IN POPULATIONS 

Although genomes can provide a rich record of the history of 
gene duplications in a particular lineage, the population-genetic 
dynamics and selection pressures on duplicated genes remain 
poorly understood. The frequency of gene copy-number poly- 
morphisms in populations is determined by a combination of 
the spontaneous duplication/deletion rate and the preservation or 
elimination of these changes by natural selection and/or random 
genetic drift. 

The fixation of a gene duplicate in a population faces multiple 
obstacles. First, there is a high probability that the duplicated gene 
is lost from the population by random genetic drift. Moreover, 
most gene duplications are probably detrimental to organismal 
fitness. They can perturb optimal dosage balance between genes 
contained in the duplicated regions with genes elsewhere in the 
genome, and increased gene dosage can be costly because of superfluous
gene expression (Papp et al., 2003; Veitia, 2004). Empirical
estimates of this cost in Salmonella were substantial
(3-16%; Reams et al., 2010). In addition to reducing fitness, many
gene duplications are inherently unstable, particularly if they are in 
tandem orientation or flanked by repeat elements (Anderson and 
Roth, 1981). Lastly, given that most mutations are degenerative, 
a duplicated gene is much more likely to end up as a pseudogene 
than to acquire a function that is distinct from the ancestral gene 
and actively maintained by natural selection. Loss of one copy,
either due to deletion or mutational inactivation, is the fate of
the overwhelming majority of duplicated genes (Haldane, 1933; 
Lynch and Conery, 2000). How redundant gene copies get to be 
fixed and subsequently maintained in a population has emerged as 
an important issue in the population-genetic theory of evolution 
by gene duplication (Force et al., 1999).

Several mechanisms have been proposed that would facilitate 
retention of a duplicated gene in a genome. (i) Redundancy could
be beneficial because it protects the genome from the immedi- 
ate deleterious effects of degenerative mutations (Clark, 1994). 
(ii) Degenerative mutations can lead to loss of different sub- 
functions in the two copies of a gene in such a way that both 
copies would be required to perform what was originally the 
role of a single ancestral locus (DDC, Duplication-Degeneration-Complementation;
Hughes, 1994; Force et al., 1999). (iii) If there
is a heterotic interaction (or overdominance) between alleles at a 
locus, the same beneficial interaction between alleles at two loci 
can maintain the duplication through natural selection (Spofford, 
1969). (iv) Natural selection can result in functional divergence 
(neofunctionalization) between alleles prior to gene duplication 
and different alleles can then be preserved at different loci fol- 
lowing duplication (Proulx and Phillips, 2006). (v) Although 
gene duplications create redundant gene copies, many detrimental 
mutations could still be subject to purifying selection if they inter- 
fere with the function of the wild-type copy and this would delay 
the process of turning one of the gene copies into a pseudogene 
(Walsh, 2003). However, selection against these detrimental mutations
would not protect against the deletion of duplicated genes.
(vi) Increase in gene dosage ("more of the same") can be advantageous
directly and would result in an increase in gene copy-number
(Ohno, 1970). Selection for greater gene dosage does not have to be 
for the gene's primary activity. When a promiscuous side-function 
of a gene becomes biologically valuable, selection for increase in 
gene dosage would help the spread and maintenance of a dupli- 
cated gene in the population until subsequent beneficial mutations 
result in a novel gene (Roth et al., 1996; Hendrickson et al., 2002;
Hooper and Berg, 2003; Bergthorsson et al., 2007). There are certain
similarities between some of these proposed mechanisms of
selective retention of duplicates. For example, hypotheses (iii),
(iv), and (vi) depend on natural selection for functions that are
already present in the population prior to duplication. 

THE IMPORTANCE OF THE GENE DUPLICATION RATE IN EVOLUTION 

The rate at which copy-number variation is introduced and erad- 
icated from populations is crucial to understanding the early 
evolutionary dynamics of novel genes and the evolution of com- 
plexity. Both the standing levels of genetic variation and the genetic 
load are expected to be critically dependent on the rates and fitness 
effects of spontaneous gene duplications and deletions. The res- 
olution of the duplication and deletion rate parameters will also 
serve to elucidate the role of gene copy-number in the evolution 
of disease. 

The duplication rate is a key parameter in determining the 
equilibrium frequency of gene copy-number in populations. For 
neutral duplications, the equilibrium frequency of duplicated 
genes is expected to be D/(D + L), with D as the spontaneous
duplication rate and L as the rate of spontaneous loss of duplicate 
gene copies. In the event of deleterious duplications, the equi- 
librium frequency still depends largely on the duplication rate. 
The opportunity for mutations that result in the maintenance of 
duplicate copies, either through neofunctionalization or subfunctionalization,
depends on the equilibrium frequency of additional
gene copies in the population, and hence on the spontaneous gene 
duplication (and loss) rate. The duplication rate may therefore 
have profound effects on the role of adaptation in the evolution of 
duplicated genes (Ohta, 1988). 
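
As a rough illustration of the neutral expectation D/(D + L), the sketch below iterates a simple two-state recursion in which, each generation, a fraction D of single-copy chromosomes gains a duplicate and a fraction L of duplicate-bearing chromosomes loses it. The recursion, the function names, and the numerical values of D and L are assumptions chosen for illustration; they are not taken from any of the studies cited here.

```python
def equilibrium_frequency(dup_rate, loss_rate):
    """Closed-form neutral equilibrium frequency D / (D + L)."""
    return dup_rate / (dup_rate + loss_rate)


def iterate_frequency(dup_rate, loss_rate, generations, start=0.0):
    """Iterate p' = p * (1 - L) + (1 - p) * D for a number of generations."""
    p = start
    for _ in range(generations):
        p = p * (1.0 - loss_rate) + (1.0 - p) * dup_rate
    return p


# Hypothetical rates per gene per generation (illustrative only).
D = 1e-5
L = 1e-3

p_closed = equilibrium_frequency(D, L)
p_iterated = iterate_frequency(D, L, generations=50_000)
print(f"closed form: {p_closed:.6f}, iterated: {p_iterated:.6f}")
```

Under these assumed values the recursion settles at the closed-form value, and the equilibrium scales almost linearly with D whenever L is much larger than D.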

Following the rediscovery of Mendel's laws, some geneticists 
started attributing greater importance to mutations as the driving 
force in evolutionary change, and de-emphasizing the importance 
of natural selection (Morgan, 1916, 1925). The importance of 
mutations and their rate as the greatest determining factor in 
evolution fell out of favor after it was shown that the mutation 
rate is, at best, a very weak force in effecting changes in allele 
frequency (Haldane, 1932, 1933). The neutral theory led to a 
greater appreciation of mutation rates as an evolutionary force, but 
primarily for neutral mutations (Kimura, 1983). More recently, 
theoretical and experimental evidence suggests that differences in
mutation rates can have an orienting effect on evolutionary change
(Yampolsky and Stoltzfus, 2001; Rokyta et al., 2005). Mutations
are, in this view, not simply raw material for evolutionary change;
rather, differences in the rates of supply of different mutations
influence the outcome with respect to adaptive evolutionary
change. Given equal mutation rates, the mutations with the high- 
est fitness contributions will, on average, be fixed first (Orr, 2003). 
However, mutations that are less fit can be fixed in the population 
earlier than the fittest mutation if the former are more frequent 
(Yampolsky and Stoltzfus, 2001; Rokyta et al., 2005). Moreover, the
influence of the mutation rate on the rate of fixation of beneficial 
mutations is greater at smaller effective population sizes (Yampol- 
sky and Stoltzfus, 2001). Let us consider the case of selection for 
increased gene dosage. Both gene duplication and point mutation 
can result in increased gene expression, and many point mutations 
might yield higher expression levels than duplications. However, 
if the gene duplication rate greatly exceeds the per nucleotide sub- 
stitution rate, duplications will have an opportunity to increase 
in frequency, and perhaps reach fixation, before the appearance of 
point mutations in the population with similar or greater effects on 
gene expression. The rate of gene duplication relative to base sub- 
stitutions is therefore particularly relevant for the hypothesis that 
selection for gene dosage is important in the initial preservation 
of duplicated genes. 
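
The dosage argument can be made concrete with a simple origin-fixation calculation. This is a minimal sketch under stated assumptions: beneficial mutations are rare enough to fix one at a time, the fixation probability of a new beneficial mutation is approximated by Haldane's 2s, and every rate and selection coefficient below is a hypothetical value chosen for illustration. Population size cancels in this approximation, so the sketch does not capture the population-size dependence mentioned above.

```python
def fixation_rate(pop_size, mutation_rate, selection_coeff):
    """Expected rate at which a mutation class arises and then fixes.

    2N * mu new copies appear per generation in a diploid population,
    and each fixes with probability of roughly 2s (Haldane's approximation).
    """
    return 2 * pop_size * mutation_rate * 2 * selection_coeff


def prob_first(rate_a, rate_b):
    """Probability that class A fixes before class B, treating both
    as independent Poisson processes."""
    return rate_a / (rate_a + rate_b)


# Hypothetical parameters (illustrative only).
N = 10_000
dup_rate = fixation_rate(N, mutation_rate=1e-6, selection_coeff=0.01)
point_rate = fixation_rate(N, mutation_rate=1e-9, selection_coeff=0.05)

p_dup_first = prob_first(dup_rate, point_rate)
print(f"P(duplication fixes first) = {p_dup_first:.4f}")
```

With these assumed numbers the duplication is expected to fix first in the large majority of cases, even though the point mutation carries a fivefold larger benefit, because its supply rate is a thousand times lower.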

ANALYTICAL METHODS USED TO ESTIMATE THE GENE DUPLICATION 
AND DELETION RATE 

Several approaches have been used to estimate the spontaneous 
gene duplication and deletion rates. These estimates have primar- 
ily come from four sources: (i) direct measurements on a single 
locus where gene copy-number differences resulted in a distinct 
phenotype or genotype, (ii) analyses of frequencies of dupli- 
cation polymorphisms in populations, (iii) calculations based 
on the abundance of evolutionarily recent gene duplications in 
sequenced genomes, and (iv) direct genome-wide estimates of 
the duplication/deletion rate from molecular analyses of mutation 
accumulation (MA) lines evolved experimentally under a regime 
of minimal natural selection. 

Direct estimates at specific loci have yielded the highest gene 
duplication frequencies. In contrast, analysis of the age distribu- 
tion of genes in sequenced genomes yields rates that are orders 
of magnitude lower (Lynch and Conery, 2000, 2003; Gu et al.,
2002; Pan and Zhang, 2007). However, the analyses of sequenced 
genomes assume that the birth and death rates of duplicated genes 
are constant over long evolutionary periods. This may be unwar- 
ranted if most gene duplications are detrimental and removed 
from the population by natural selection soon after conception. 

PER-LOCUS RATES

Per-locus rates of gene duplication have been empirically gen- 
erated for bacteria, flies and humans (Table 1). However, these 
estimates are often based on a very limited number of loci and 
may not be representative for these genomes. 

PROKARYOTES

Early experiments with phage and bacteria suggested a fairly high 
duplication rate per gene. For example, experiments with the 



Table 1 | Locus-specific duplication rates for prokaryotes and eukaryotes.

                     Locus-specific duplication rates
Species              Locus                                    Partial genome

Prokaryotes
S. enterica          2.0 x 10^-3 (ArgH) (a)                   3.2 x 10^-3 to 5.8 x 10^-5
                     3.0 x 10^-4 (LacZ) (a)                   duplications per locus (b)
                     4.6 x 10^-6 (PyrD) (a)

Multicellular eukaryotes
D. melanogaster      1.6 x 10^-5 (Rosy) (c)
                     1.7 x 10^-4 (Rosy) (d)
                     2.7 x 10^-6 (Maroon-like) (d)
                     4.0 x 10^-7 (Body- and eye-color) (e)
H. sapiens           1.7 x 10^-5 (PMP22) (f)
                     2.6 x 10^-5 (α-globin) (g)
                     1.0 x 10^-8 (DMD) (h)

One rate estimate based on 38 loci is included. All rate measurements are in
duplications/gene/generation unless otherwise specified. The loci are listed in
parentheses.
(a) Reams et al. (2010)
(b) Anderson and Roth (1981); across 38 loci in overnight culture
(c) Gelbart and Chovnick (1979)
(d) Shapira and Finnerty (1986)
(e) Watanabe et al. (2009)
(f) Lupski (2007)
(g) Lam and Jeffreys (2007)
(h) Van Ommen (2005)

lac operon in Escherichia coli suggested spontaneous duplication
rates on the order of 10^-3 to 10^-4 per gene (Horiuchi et al., 1963;
Langridge, 1969; Anderson and Roth, 1977). More generally, the
reported duplication frequencies in bacteria and phage for a
diversity of genes ranged from 10^-3 to 10^-5 (Anderson and Roth,
1977; Starlinger, 1977). The first systematic large-scale study of
duplication frequency analyzed 38 duplicated loci in stationary
phase cultures of Salmonella and found frequencies ranging from
10^-3 to 10^-5 per gene (Anderson and Roth, 1981). It should be
noted that these estimates do not constitute duplication rates per 
generation as they had accumulated during the growth of the 
culture where the duplication rate had been countered by both 
a high rate of spontaneous duplication loss and natural selec- 
tion. A more recent analysis of the duplication rate at three loci 
in the Salmonella genome found rates ranging from 2 x 10^-3
to 4.6 x 10^-6 duplications/gene/generation after carefully controlling
for selection and spontaneous duplication loss (Reams
et al., 2010). The equilibrium frequency of duplications in culture
can likewise be quite high, and high-throughput sequencing of
Salmonella cultures demonstrated that the percentage of cells carrying
duplications had reached a steady-state frequency of 20%
(Sun et al., 2012).

EUKARYOTES 

Direct estimates of duplication rates at two loci in
D. melanogaster, maroon-like and rosy, were 2.7 x 10^-6
and 1.7 x 10^-4 duplications/locus/generation, respectively (Gelbart
and Chovnick, 1979; Shapira and Finnerty, 1986). More
recently, inverse PCR-based methods were used to measure the
rates of duplication and deletion of human α-globin genes (Lam
and Jeffreys, 2006, 2007). The frequencies of spontaneous α-globin
duplication in sperm were 2.6 x 10^-5 and 6.2 x 10^-5 in two
human males. However, it is possible that the actual duplication
rate of α-globin genes is in fact higher than reported because the
PCR primers used to detect the duplications were designed to
detect specific kinds of duplications, and translocated and inverted
duplications would not have been detected. Similar methods were
used to determine the duplication and deletion rates at four loci in
humans and the duplication rate estimates ranged from 1.7 x 10^-5
to 8.7 x 10^-7 (Turner et al., 2008).

Lastly, Watanabe et al. (2009) screened 1,554 progeny of wild-caught
D. melanogaster females for spontaneous eye- and body-color
mutations and identified five large deletions ranging from
40 to 500 kb. If these deletions originated via unequal crossing-over,
the duplication rate should equal the deletion rate. Based
on this assumption, the per gene duplication rate was estimated
to be 4 x 10^-7/generation, of a similar order of magnitude to other
empirical per gene duplication rates in Drosophila (Watanabe et al.,
2009).

These estimates from single loci yield some of the highest 
estimates of the duplication rate. This may stem from both a sam- 
pling bias toward loci with known high duplication rates, and 
because some of the examples come from loci that are experi- 
encing unequal crossing-over between related genes. For example, 
analysis of the duplication rate at the rosy locus was undertaken 
after observing that tandem duplications were occurring at an 
unusually high frequency (Gelbart and Chovnick, 1979). Similarly, 
α-globin gene copy-number polymorphism was well known and
particularly common in populations with high exposure to malaria 
(Lam and Jeffreys, 2006). The high rate of duplications and dele- 
tions found in these systems may therefore not be representative 
of the genome at large. 

ESTIMATES OF THE DUPLICATION RATE BASED ON 
POPULATION FREQUENCY OF CNVs 

The duplication rate can also be estimated using the frequency of 
gene duplications in a population and population-genetic theory 
of mutation-selection balance. Haldane (1935) showed that for 
X-linked genes in equilibrium, the mutation rate can be estimated 
using (1/3)(1 - f)x, where f is the fertility of affected males relative
to unaffected males and x is the frequency of affected males in the 
population. If the X-linked mutation results in lethality or steril- 
ity, the mutation rate is estimated as x/3. Using this approach, Van 
Ommen (2005) calculated the rate of new gene duplications in 
the X-linked human dystrophin gene leading to Duchenne Muscular
Dystrophy (DMD). Males with DMD have, until recently,
been mostly nonreproductive. The frequency of DMD in male
newborns is 1:3,500 and the frequency of mutations leading to
DMD is thus ~10^-4 (Table 1). Subgenic duplications account for
9% of these mutations and the rate of duplication was therefore
estimated to be ~10^-5 duplications/DMD locus/generation. The
DMD locus is very large (2.5 Mb) and extrapolating from this region to
the whole genome, the genome-wide duplication rate should be 
0.02 duplications/genome/generation. This would be an underes- 
timate if (i) many internal duplications do not result in a DMD 
phenotype, and/or (ii) if duplications that encompass the whole 
locus do not result in a DMD phenotype. 
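
The per-locus arithmetic behind the DMD estimate can be retraced in a few lines. The sketch below applies Haldane's estimator with f = 0 (affected males treated as non-reproductive) to the newborn frequency and the 9% duplication share quoted in the text; the function and variable names are illustrative assumptions. The genome-wide extrapolation is left out because it depends on a genome size that the text does not state.

```python
def haldane_x_linked_rate(affected_male_freq, relative_fertility=0.0):
    """Haldane's estimator (1/3) * (1 - f) * x for X-linked mutations."""
    return (1.0 - relative_fertility) * affected_male_freq / 3.0


dmd_freq = 1 / 3500    # frequency of DMD in male newborns (from the text)
dup_fraction = 0.09    # share of DMD mutations that are subgenic duplications

# With f = 0 the estimator reduces to x / 3.
mutation_rate = haldane_x_linked_rate(dmd_freq)
duplication_rate = dup_fraction * mutation_rate

print(f"DMD mutation rate:    {mutation_rate:.2e} per locus per generation")
print(f"DMD duplication rate: {duplication_rate:.2e} per locus per generation")
```

The two results, about 9.5 x 10^-5 and 8.6 x 10^-6, round to the ~10^-4 and ~10^-5 figures given above.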

CMT1A, a subtype of Charcot-Marie-Tooth (CMT) syndrome, 
frequently results from a large duplication that includes the PMP22 
gene. Based on the prevalence of CMT1A and the fraction of CMT 
caused by duplications, the spontaneous duplication rate was estimated
to be between 1.7 and 2.6 x 10^-5 duplications/PMP22
locus/generation (Lupski, 2007). This rate is very similar to the 
rate estimated for DMD and three orders of magnitude higher 
than the spontaneous point mutation rate in humans. 

BIOINFORMATICALLY DERIVED ESTIMATES OF THE 
DUPLICATION RATE FROM WHOLE GENOME SEQUENCES 

Lynch and Conery (2000, 2003) pioneered methods for estimating 
the duplication frequency in sequenced genomes from the age- 
distribution of duplicated genes based on the synonymous site 
divergence between gene paralogs. Their analyses found, for example,
that duplications arise at a rate of 0.0011, 0.0028, and 0.0025 per
gene per 1% divergence at synonymous sites in the D. melanogaster,
Caenorhabditis elegans, and Saccharomyces cerevisiae genomes,
respectively (Lynch, 2007; Table 2). The spontaneous base substitution
rate in these species has been measured as 55, 21, and
3.3 x 10^-10 mutations/base pair/generation (Haag-Liautard et al.,
2007; Lynch et al., 2008; Denver et al., 2009; Keightley et al., 2009;
Schrider et al., 2013). If we utilize these rates to convert the historical
gene duplication rate to frequency per gene per generation,
the duplication rate would be 60.5, 58.8, and 8.25 x 10^-11 in
D. melanogaster, C. elegans and S. cerevisiae, respectively. These
calculations assume that synonymous site changes are neutral, and
Table 2 | Genome-wide estimates of the duplication rates for
prokaryotes and eukaryotes.

                     Genome-wide gene duplication rate
Species              Bioinformatic                                          Empirical

Unicellular eukaryotes
E. cuniculi          11.7 x 10^-3 per 1% silent-site divergence
P. falciparum        0.3 x 10^-3 per 1% silent-site divergence (a,b)
S. cerevisiae        2.5 x 10^-3 per 1% silent-site divergence (a,b)        3.4 x 10^-6 (d)
                     1.0 x 10^-11 /gene/year (c)
S. pombe             1.6 x 10^-3 per 1% silent-site divergence (a,b)

Multicellular eukaryotes
A. gambiae           6.2 x 10^-3 per 1% silent-site divergence (a,b)
A. thaliana          3.2 x 10^-3 per 1% silent-site divergence (a,b)
C. elegans           2.8 x 10^-3 per 1% silent-site divergence (a,b)        3.4 x 10^-7 (e)
D. melanogaster      1.1 x 10^-3 per 1% silent-site divergence (a,b)        3.7 x 10^-7 (f)
F. rubripes          4.3 x 10^-3 per 1% silent-site divergence (a,b)
H. sapiens           4.9 x 10^-3 per 1% silent-site divergence (a,b)
                     1.1 x 10^-9 /gene/year (g)
M. musculus          3.0 x 10^-3 per 1% silent-site divergence (a,b)

Estimates are further classified into bioinformatic versus empirical estimates.
Bioinformatic estimates are based on the distribution of evolutionarily young gene
duplicates in the genomes of laboratory strains or natural isolates. Empirical
estimates are derived from mutation accumulation (MA) experiments involving
experimental lines propagated under strict bottlenecking conditions. All rate
measurements are in duplications/gene/generation unless otherwise specified. The
loci are listed in parentheses.
(a) Lynch and Conery (2003)
(b) Lynch (2007)
(c) Gao and Innan (2004)
(d) Lynch et al. (2008)
(e) Lipinski et al. (2011)
(f) Schrider et al. (2013)
(g) Cotton and Page (2005)

in the event that there is some negative selection on synonymous 
sites, the per generation duplication rates would be overestimated. 
However, it was noted that the duplication rates inferred from 
the age distribution of gene duplicates might be underestimates
for several reasons. (i) The assembly of whole genome
sequences following shotgun sequencing may erroneously assume 
evolutionarily recent gene duplicates for redundant sequences of 
single-copy genes (Lynch and Conery, 2003). (ii) This particu- 
lar analysis did not include paralogs in gene families possessing 
more than five members. The rates of spontaneous duplica- 
tion and deletion might increase with the size of a gene family 
due to greater abundance of regions of high sequence identity 
that could serve as targets for copy-number changes by unequal 
exchange. 
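
The conversion from a rate per 1% silent-site divergence to a rate per gene per generation, used earlier in this section, can be retraced with the figures quoted in the text. The sketch below follows the arithmetic implied by those figures: 1% divergence is treated as the time needed for a site mutating at the base substitution rate to accumulate 0.01 changes. The function name and dictionary layout are illustrative assumptions.

```python
def per_generation_rate(dup_per_percent_divergence, base_sub_rate):
    """Convert duplications/gene per 1% silent-site divergence into
    duplications/gene/generation, given a per-site mutation rate."""
    generations_per_percent = 0.01 / base_sub_rate
    return dup_per_percent_divergence / generations_per_percent


# (duplications per gene per 1% divergence, mutations/base pair/generation),
# both as quoted in the text.
species = {
    "D. melanogaster": (0.0011, 55e-10),
    "C. elegans": (0.0028, 21e-10),
    "S. cerevisiae": (0.0025, 3.3e-10),
}

rates = {name: per_generation_rate(d, mu) for name, (d, mu) in species.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.3e} duplications/gene/generation")
```

This reproduces the values of 60.5, 58.8, and 8.25 x 10^-11 given above, which makes the gap to the directly measured rates in Table 2 easy to see.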

Gene conversion between duplicate gene copies lowers 
nucleotide sequence divergence between them, making them 
appear evolutionarily younger than they actually are (Teshima and 
Innan, 2004; Katju and Bergthorsson, 2010; Rane et al., 2010). If
gene conversion between duplicated genes is common, the num- 
ber of recent gene duplications in genomes is overestimated under 
the approach used by Lynch and Conery (2000, 2003). This in 
turn would lead to an inflated gene duplication rate. Using the 
genome of S. cerevisiae and six of its relatives, Gao and Innan
(2004) calculated the gene duplication rate in yeast by a method 
that does not depend on synonymous site divergence between 
duplicate copies in a genome. They found strong evidence for 
gene conversion between duplicate gene copies, and estimated 
gene duplication rates to be 0.01-0.06 duplications/gene/billion 
years, two orders of magnitude lower than the previous estimate 
of Lynch and Conery (2000). However, S. cerevisiae with its large 
effective population size (Ne = ~3.3 x 10^7; Lipinski et al., 2011)
typically characteristic of unicellular eukaryotes is subject to a 
strong intensity of natural selection. Hence, the observed num- 
ber of extant gene duplicates in a sequenced genome may grossly 
underestimate the gene duplication rate as many gene paralogs 
may have been purged from the genome in their infancy leaving 
no signature of their brief existence (Katju et al., 2009; Watanabe
et al., 2009; Lipinski et al., 2011; Katju, 2012).

Codon usage bias due to selection for optimal codon use might 
also confound analyses of gene duplication rates with methods 
that rely on DNA sequence divergence at synonymous sites (Gu
et al., 2002). The rate of molecular evolution in genes that are
subject to natural selection against synonymous mutations in pre- 
ferred codons is slower than at sites where nucleotide substitutions 
are selectively neutral. Duplicated genes that are experiencing 
selection for codon usage would therefore appear evolutionarily
younger than they are. Gu et al. (2002) therefore suggested
comparing DNA sequence divergence at synonymous sites in 
duplicated genes to sequence divergence in their introns and flank- 
ing sequences to exclude genes that appear to have undergone 
gene conversion or natural selection for codon usage bias. After 
"cleaning" their database of genes experiencing gene conversion or 
selection at synonymous sites, Gu et al. estimated the gene dupli- 
cation rates in S. cerevisiae, D. melanogaster, and C. elegans to be 
0.028, 0.0014, and 0.024 duplications/gene/million years, respec- 
tively. These results are qualitatively similar to the results of Lynch 
and Conery (2000, 2003). 

More recently, Pan and Zhang (2007) estimated the gene 
duplication rates in mouse and humans, using synonymous site 
divergence as a proxy for the age of duplicated genes, as in some
of the previous analyses, and attempting to distinguish between
tandem duplications by unequal crossing over and retrotrans- 
position. Their estimates of the overall gene duplication rate 
ranged from 0.0005 to 0.00149 and from 0.00123 to 0.00423 dupli- 
cations/gene/million years in humans and mouse, respectively. 
Bensasson et al. (2003) arrived at rates similar to those of Lynch and Conery
(2000, 2003) based on the number of duplicated mitochondrial
genes that have been transferred to the nucleus (NUMTs). 

DIRECT GENOME-WIDE ESTIMATES OF THE SPONTANEOUS 
DUPLICATION RATE FROM MA EXPERIMENTS 

Direct empirical analyses of individual loci where gene copy- 
number differences result in a distinct phenotype or genotype have 
provided the highest estimates of the gene duplication and dele- 
tion rates (Anderson and Roth, 1977, 1981; Shapira and Finnerty, 
1986; Lam and Jeffreys, 2007; Watanabe et al., 2009). However,
per-locus measures of the duplication/deletion rate may not be
widely applicable at the genome-wide level. Experimental mutation
accumulation (MA henceforth) lines offer two advantages in the estimation of
mutation rates and parameters. First, they enable the most accurate
estimation of mutation rates without the purging influence
of purifying natural selection. Second, in conjunction with mod- 
ern genome-wide techniques of analyses, they serve to directly 
quantify genome-wide mutation rates with minimal bias. The 
underlying principle behind MA experiments is straightforward; 
multiple replicate lines derived from an inbred ancestral stock 
population are allowed to evolve independently of one another 
under conditions of extreme bottlenecking each generation. The 
repeated bottlenecks severely diminish the efficacy of natural selec- 
tion, promoting evolutionary divergence due to the accumulation 
of deleterious mutations by random genetic drift. Most MA studies
have maintained the organism at a constant, minimal N_e in order to
drastically reduce the efficacy of selection and allow the vast
majority of mutations to accumulate (Mukai, 1964; Ohnishi, 1977;
reviewed in Halligan and Keightley, 2009).

The advancement of molecular technologies such as high-throughput
genome sequencing and oligonucleotide array comparative genome
hybridization (oaCGH henceforth) has enabled genome-wide analyses
of the DNA content of MA lines, generating the first empirical
measures of the spontaneous gene duplication and deletion rate in
a handful of model organisms (Table 2). Lynch et al. (2008)
conducted pulsed-field gel electrophoresis (PFGE) and oaCGH on
eight S. cerevisiae MA lines passaged through 200 bottleneck
generations and estimated the spontaneous duplication rate to be
3.4 × 10^-6 per gene/generation. This spontaneous duplication rate
in S. cerevisiae is four orders of magnitude greater than the
spontaneous base-substitution rate of 0.33 × 10^-9 per
site/generation in this species. Moreover, this spontaneous
duplication rate vastly exceeds previous estimates arrived at from
bioinformatic analyses (Lynch and Conery, 2000; Gao and Innan,
2004) of the originally sequenced S. cerevisiae genome (Goffeau
et al., 1996). Additionally, the yeast genome originally sequenced
by Goffeau et al. (1996) has an extremely low incidence of extant
paralogs with low synonymous divergence that originated from
small-scale duplication events (Katju et al., 2009). Of this already
limited number of paralogs, a substantial number are likely of
older evolutionary origin given the high incidence of selection
for codon usage bias in conjunction with ectopic gene conversion
within this species (Gao and Innan, 2004; Lin et al., 2006).
So where are these new paralogs that are spawned at astoundingly
high rates? One hypothesis is that most duplicates have,
at the minimum, mildly deleterious fitness effects that render
them amenable to rapid purging from the genome in a unicellular
eukaryotic species such as S. cerevisiae with a high N_e (Katju
et al., 2009; Lipinski et al., 2011; Katju, 2012). As such, genome
sequences of isolates/strains that have been subject to some degree
of natural selection will invariably underestimate the spontaneous
rate of duplication.

Lipinski et al. (2011) provided the first empirical, genome-wide
estimates of the spontaneous rate of duplication and deletion in a
multicellular eukaryote, the nematode C. elegans. As in the preceding
study with S. cerevisiae, long-term MA lines formed the focus of
this study to ensure unbiased estimates of the spontaneous rates of
gene duplication with minimal influence of natural selection. Ten
C. elegans MA lines subjected to single-worm bottlenecks for an
average of 432 generations were assayed using oaCGH. In total, 14
duplicated segments that comprised complete and/or partial gene
duplications were detected and verified independently via
quantitative PCR. These duplicated segments encompassed 30 genes,
giving a spontaneous rate of gene duplication of 3.4 × 10^-7 per
gene/generation for partial or complete duplications. If only
complete gene duplicates were considered, the spontaneous rate of gene
duplication was 1.25 × 10^-7 per gene/generation. The authors
argued that this estimate is downwardly biased for two reasons, 
namely (i) the number of adjacent microarray probes signaling 
gene copy-number changes may not be sufficiently dense for the 
detection of duplication events with small duplication spans, and 
(ii) the oaCGH DNA microarrays were restricted to unique probes 
only and duplications of genes in recently duplicated regions, for 
instance by unequal crossing over, may not have been detected. 
Despite the possibility that this rate is an underestimate, it is
two orders of magnitude greater than the C. elegans spontaneous
base-substitution rate of ~10^-9 per site/generation (Denver et al.,
2009). Additionally, this empirical spontaneous duplication rate
estimate is two orders of magnitude greater than the estimate
calculated from bioinformatic analyses of the frequency distribution
of extant paralogs of varying evolutionary age (Lynch and Conery,
2000) in the originally sequenced genome of the N2 laboratory
strain of C. elegans (C. elegans Sequencing Consortium, 1998).
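
The arithmetic behind a per-gene rate from an MA experiment can be sketched as follows. The counts (30 duplicated genes, 10 lines, an average of 432 generations) and the base-substitution rate of ~10^-9 are taken from the text; the number of genes assayed is not stated here, so the value of 20,000 is an assumption used only for illustration, and the function name is hypothetical.

```python
import math


def per_gene_rate(genes_affected, genes_in_genome, n_lines, generations_per_line):
    """Events per gene per generation, pooled across all MA lines."""
    gene_generations = genes_in_genome * n_lines * generations_per_line
    return genes_affected / gene_generations


# From the text: 30 genes duplicated across 10 lines, 432 generations on average.
# ASSUMPTION (not in the text): roughly 20,000 genes were assayed.
ASSUMED_GENE_COUNT = 20_000
duplication_rate = per_gene_rate(30, ASSUMED_GENE_COUNT, 10, 432)

# Compare with the base-substitution rate of ~1e-9 per site/generation.
orders_of_magnitude = math.log10(duplication_rate / 1e-9)

print(f"duplication rate: {duplication_rate:.2e} per gene/generation")
print(f"excess over base substitutions: {orders_of_magnitude:.1f} orders of magnitude")
```

Under the assumed gene count, the result lands close to the reported 3.4 × 10^-7 per gene/generation, which suggests the published figure follows this simple pooling of genes, lines, and generations.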

More recently, Schrider et al. (2013) sequenced the genomes of
eight sublines derived from two ancestral lines of a long-term MA
experiment in D. melanogaster. Despite the use of vastly different
technologies for the estimation of the spontaneous duplication
rate in C. elegans (oaCGH) and D. melanogaster (Illumina paired-end
sequencing), the duplication rate estimates are surprisingly
similar. Schrider et al. (2013) generated the following rates for
D. melanogaster: 3.75 × 10^-7 per gene/generation for partial or
complete duplications and 1.25 × 10^-7 per gene/generation if only
complete duplications were considered.

ESTIMATES OF THE DELETION RATE 

The frequency of gene copy-number polymorphisms in genomes 
is determined by a combination of the spontaneous duplica- 
tion/deletion rate and the preservation or elimination of these 
changes by natural selection. Hence, in conjunction with other 
evolutionary forces such as selection and genetic drift, the net dif- 
ference in the spontaneous rates of duplication and deletion has 
important consequences for the evolution of genome size. Further- 
more, duplications and deletions may work in concert with one 
another. For example, aneuploidy and duplications were common
in a collection of random yeast deletion mutants (Hughes et al.,
2000). The duplicated regions often contained genes that were
related to the deleted genes, suggesting that the duplications were
compensating for the deletions even though the primary functions
of the deleted and duplicated genes are not identical (Hughes et al.,
2000). There exists ample evidence that loss-of-function mutations,
for example due to gene deletions, can often be suppressed
or compensated for by multiple copies, or increased transcription,
of another gene in the genome (Berg et al., 1988; Bender and
Pringle, 1989; Trempy and Gottesman, 1989; Ueguchi and Ito,
1992; Yamanaka et al., 1994; Serebrijski et al., 1995; Timms and
Bridges, 1998; Menez et al., 2001; Miller and Raines, 2004; Patrick
et al., 2007; Patrick and Matsumara, 2008). This phenomenon is
known as "multicopy suppression" and typically results from side
functions of a multicopy gene that go unnoticed when it exists
as a single copy in the genome (Berg et al., 1988). On the flip
side, deletion events subsequent to duplications can occur commonly
and pervasively at the genome-wide level, leading to the
"diploidization" of polyploids and the evolution of reproductive
incompatibilities (Wolfe, 2001; Kashkush et al., 2002; Langkjaer
et al., 2003; Brunet et al., 2006; Scannell et al., 2006; Albertin and
Marullo, 2012). Internal deletions of segmental duplications can
also play a role in the eventual fate of duplications. Experiments 
with selected gene amplifications in Salmonella have revealed that 
large duplications are frequently followed by internal deletions 
that appear to facilitate further amplification, by reducing the 
fitness cost associated with amplification of genes that are not 
under selection for increase in gene dosage (Kugelberg et al., 2006, 
2010). 

The gene deletion frequency in bacteria is generally lower than
the duplication rate, and ranges from 10^-4 to 10^-8 (Starlinger,
1977). Using PFGE in combination with sequential bottlenecking of
colonies, which reduces the effective population size, experiments
in Salmonella found the deletion rate to be 0.5 × 10^-8 (Nilsson
et al., 2005). This is probably an underestimate because there is
still selection against deleterious deletions and the PFGE approach 
only detects relatively large deletions (Nilsson et al., 2005). If many 
deletions resulted in the loss of essential genes, they would not 
be represented in this estimate. However, if spontaneous gene 
deletion rates are indeed lower than gene duplication rates in 
bacteria, then what is keeping bacterial genomes lean? One con- 
tributing factor is adaptive gene loss (discussed below). We further 
need to take into consideration that the evolutionary dynamics 
of duplications are different from deletions in that duplications 
are prone to loss through recombination. Hence, the instability 
of segmental duplications relative to deletions likely serves as a 
factor in maintaining streamlined bacterial genomes. Lastly, nat- 
ural selection in large bacterial populations is also expected to be 
more efficient in eliminating slightly deleterious duplications rel- 
ative to multicellular eukaryotes with smaller effective population 
sizes. 

Inverse-PCR methods in humans found that the duplication
and deletion rates of α-globin were very similar. Deletions in
α-globin genes can be common in areas where malaria is endemic,
and polymorphism for the number of α-globin genes is probably
maintained by balancing selection involving increased resistance
to malaria (Flint et al., 1986). The frequencies of spontaneous
α-globin deletions in the sperm of two human males were
1.6 × 10^-5 and 6.8 × 10^-5. More recently, similar methods were
used to determine the duplication and deletion rates at four
hotspots in human sperm. The deletion rate estimates ranged from
9.5 × 10^-6 to 2.2 × 10^-5, with all deletion rate estimates
exceeding the duplication rates by 2.1- to 4.1-fold (Turner et al.,
2008). The population frequencies of CNVs resulting in
DiGeorge/velocardiofacial syndrome, Williams-Beuren syndrome, and
Smith-Magenis syndrome have been used to estimate the spontaneous
deletion rate in humans. The estimated rates range from 2 × 10^-5
to 1.25 × 10^-4 deletions/locus/generation (Lupski, 2007). Loss of
a gene duplication generally occurs at a higher rate than the
duplication itself. For example, loss of the Bar duplication in
D. melanogaster may occur at a rate as high as 10^-3 (Sturtevant,
1925).

Genome-wide estimates of the spontaneous deletion rates are
currently available for three species: S. cerevisiae (Lynch et al.,
2008), C. elegans (Lipinski et al., 2011), and D. melanogaster
(Schrider et al., 2013). The spontaneous deletion rates were
2.1 × 10^-6, 2.2 × 10^-7, and 9.37 × 10^-7 per gene/generation in
S. cerevisiae, C. elegans, and D. melanogaster, respectively. In
S. cerevisiae and C. elegans, there appears to be a slight excess 
of duplications relative to deletions when considered on a gene- 
by-gene basis, whereas the deletion rate exceeded the duplication 
rate in the D. melanogaster experiment. However, deletions tend 
to be smaller than duplications and the net change in base pairs 
is positive in all three experiments. That is, nucleotides added by 
duplications exceed those deleted. 
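
The gene-by-gene comparison above can be checked directly from the rates quoted in this review. The sketch below uses the duplication rates for partial or complete duplications and the deletion rates as reported; the dictionary and function names are illustrative only, and the ratio says nothing about the net change in base pairs, which depends on event sizes not given here.

```python
# Spontaneous rates per gene per generation, as quoted in the text.
duplication_rate = {
    "S. cerevisiae": 3.4e-6,
    "C. elegans": 3.4e-7,
    "D. melanogaster": 3.75e-7,
}
deletion_rate = {
    "S. cerevisiae": 2.1e-6,
    "C. elegans": 2.2e-7,
    "D. melanogaster": 9.37e-7,
}


def duplication_to_deletion_ratio(species):
    """Ratio > 1 means duplications outpace deletions on a per-gene basis."""
    return duplication_rate[species] / deletion_rate[species]


for species in duplication_rate:
    ratio = duplication_to_deletion_ratio(species)
    direction = "duplication excess" if ratio > 1 else "deletion excess"
    print(f"{species}: ratio = {ratio:.2f} ({direction})")
```

The ratios fall modestly above one for S. cerevisiae and C. elegans and well below one for D. melanogaster, matching the pattern described in the text.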

FITNESS EFFECTS OF CNVs 

The scientific literature is replete with descriptions of gene
duplications that are either beneficial or detrimental to the fitness of their
carriers. On the beneficial side, some of the most striking examples
in humans include the copy-number increase of the human
salivary amylase gene (AMY1) that has enabled adaptation to a
high-starch diet (Perry et al., 2007) and the copy-number increase of
the CCL3L1 gene that is associated with lowered susceptibility to
HIV infection (Gonzalez et al., 2005). Interestingly, the domestication
of dogs by humans has also resulted in a copy-number increase
in the canid amylase gene, enabling dogs to benefit from a
high-starch diet that is distinctly human, in contrast to their
wolf ancestors (Axelsson et al., 2013). Copy-number increases are
also implicated in adaptation to novel or resource-limited
environments in microbial laboratory populations (Sonti and Roth,
1989; Reams and Neidle, 2003), insecticide resistance (Newcomb
et al., 2005) or metal tolerance (Maroni et al., 1987) in natural
insect populations, drug resistance in parasites (Nair et al., 2007),
increased vertebrate resistance to bacterial pathogens (Jackson
et al., 2007), and as a compensatory response to loss-of-function
mutations (Berg et al., 1988; Bender and Pringle, 1989; Trempy
and Gottesman, 1989; Ueguchi and Ito, 1992; Yamanaka et al.,
1994; Serebrijski et al., 1995; Timms and Bridges, 1998; Menez
et al., 2001; Miller and Raines, 2004; Patrick et al., 2007).

However, most gene duplications are probably deleterious. The 
detrimental consequences of duplications can come from a vari- 
ety of sources: (i) dosage imbalance between the duplicated genes 
and other genes in the genome that remain in single copy, (ii) 
inappropriate expression of gene duplicates that are under the 
control of a different regulatory system, and (iii) the cost of super- 
fluous expression. From the perspective of the deleterious nature 
of gene duplications, increases in gene copy-number are impli- 
cated in increased susceptibility to a wide range of human diseases 
(Lupski, 1991, 1998; Inoue and Lupski, 2002 and references 
therein; Botstein and Risch, 2003; Sebat et al., 2007). Several addi- 
tional lines of evidence support the notion that gene duplications 
are, on average, deleterious. First, the large discrepancy between
empirical (from MA experiments) and bioinformatics-based estimates of
the gene duplication rate is best explained by selection against new
duplications (Katju et al., 2009; Lipinski et al., 2011).
Bioinformatically based methods to determine the duplication rate from the
age distribution of genes in a sequenced genome assume a constant 
loss rate for duplicate genes. However, if selection against duplicate 
copies in their infancy removes most detrimental gene duplicates 
before they can diverge at the DNA sequence level, the loss rate 
may appear to be constant, and yet result in an underestimate of 
the spontaneous duplication rate. Second, population variation 
in gene copy-number also suggests that duplications are generally 
detrimental. In natural populations of D. melanogaster, the allele 
frequencies of duplications are lower than expected if the duplica- 
tions are neutral (Langley et al., 2012), although not all studies can 
reject the null hypothesis of no fitness consequences of completely 
duplicated genes (Emerson et al., 2008). Third, there is a negative
correlation between allele frequencies of duplicates and
recombination rates, which is consistent with the notion that greater
efficacy of natural selection associated with higher recombination
rates is eradicating duplicates at a greater rate from regions
of high recombination relative to regions of low recombination
(Langley et al., 2012). A significant negative association of both
the length of the duplicated segment and its gene density with
allele frequencies in humans and Drosophila (Itsara et al., 2010;
Langley et al., 2012) suggests that duplications encompassing more
genes are more deleterious than those spanning fewer genes. This 
is expected if dosage imbalance plays a large role in determining 
the fitness cost of duplications. 

Deletions, like duplications, can be either detrimental or adap- 
tive. Examples of adaptive deletions are more limited relative to 
adaptive duplications and it is generally assumed that deletions are, 
on average, more detrimental than duplications. Several genome- 
wide studies of copy-number variation in humans have found 
deletion alleles to occur at lower frequencies than duplication
alleles (Conrad et al., 2006; Locke et al., 2006). This is suggestive of
strong purifying selection weeding out deletions. Furthermore,
a deficit of genic deletions has been observed in humans (Conrad
et al., 2006, 2010; Redon et al., 2006) and D. melanogaster
(Emerson et al., 2008; Langley et al., 2012), implying that deletions
in coding sequences are more deleterious than duplications
of these sequences, and therefore more likely to be
purged by purifying selection. Conrad et al. (2010) compared
the relative frequencies of deletions in two additional
genomic regions, namely intronic and intergenic. Intergenic
deletions outnumbered intronic deletions, suggesting stronger
selection against the latter, presumably because accurate
intronic sequence is required for splicing (Conrad
et al., 2010). The greater deleterious effect of deletions might
also explain why the frequency of spontaneous deletions appears
lower than that of duplications in MA experiments in yeast and
C. elegans (Lynch et al., 2008;
Lipinski et al., 2011). Although MA experiments can capture a
wide range of deleterious mutations, mutations with severe fitness 
consequences are still less likely to be fixed than mutations with 
minor and moderate fitness costs. 

Nonetheless, deletions have played an important role in adaptation.
For example, a recurrent deletion of an enhancer for Pitx1 in
sticklebacks is associated with adaptive pelvic reduction (Chan
et al., 2010). Adaptive deletions might be more common than
we assume. In experiments with Salmonella, a surprisingly high
proportion of deletions resulted in increased growth rate, which 
suggests that many bacterial genes are not necessary, and indeed 
a burden, in a specific laboratory environment (Koskiniemi et al., 
2012). Parallel patterns of gene loss have been seen in bacteria, 
for example, during infection or host adaptation and although 
it is tempting to ascribe these to adaptive gene loss, these pat- 
terns can, in principle, also be explained by relaxation of selection 
on the lost genes (Feng et al., 2011; Rau et al., 2012). However,
many studies of bacterial genome evolution suggest that gene loss 
is often adaptive. For example, the removal of pseudogenes from 
Salmonella genomes occurs at a faster rate than expected if the 
gene loss is purely neutral (Kuo and Ochman, 2010). The question 
of whether deletions are beneficial or neutral is easiest to address 
in an experimental setting rather than by retrospective analysis. In 
experiments with Methylobacterium, Lee and Marx (2012) found 
that repeated gene loss was adaptive, and the benefit from the dele- 
tions was not due to a shorter genome per se. The frequent and 
parallel patterns of gene loss in bacterial genomes recently inspired 
the Black Queen Hypothesis, which suggests that the evolution of 
dependencies in microbes resulted from selection against genes 
whose products can be acquired from other organisms (Morris 
et al., 2012).

THE ROLE OF N_e IN DICTATING CNV LOSS OR FIXATION

The loss or fixation of CNVs and their consequences for population
fitness depend upon both (i) the selection coefficients
(s) associated with individual duplications/deletions, and (ii) the
effective population size (N_e) of the species. The fate of
duplications/deletions with selection coefficients much less than the
reciprocal of N_e [|s| << 1/(2N_e) for diploids] is expected to be
dictated entirely by random genetic drift. Conversely, the dynamics
of duplications/deletions with |s| >> 1/(2N_e) are governed by
natural selection. Duplications and deletions with very
large deleterious effects will be rapidly eradicated from the
population and are unlikely to reach fixation; those with very small effects
would be effectively neutral. Although the effect of any mutation
is dependent on N_e, the prevailing opinion is that the
most detrimental class of mutations influencing long-term population
fitness includes mutations with small selection coefficients,
also referred to as slightly deleterious or nearly neutral mutations
(Ohta, 1992). Such mutations would be eradicated via purifying
selection at high N_e, but can behave in an "effectively neutral"
fashion and reach fixation by genetic drift at low N_e (Lynch and
Gabriel, 1990; Lande, 1994).
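
To make the |s| versus 1/(2N_e) threshold concrete, the sketch below evaluates the fixation probability of a single new mutation using Kimura's diffusion approximation. This formula is a standard population-genetic result brought in for illustration and is not taken from this review. It assumes additive fitness effects, a census size equal to N_e, and an initial frequency of 1/(2N_e). The selection coefficient of -10^-6 is hypothetical, and the function names are illustrative; the N_e values are those quoted in this review.

```python
import math


def fixation_probability(s, n_e):
    """Kimura's diffusion approximation for one new mutant copy.

    Assumes a diploid population whose census size equals n_e and
    additive fitness effects (heterozygote 1 + s, homozygote 1 + 2s).
    """
    if s == 0:
        return 1.0 / (2.0 * n_e)        # neutral case: the initial frequency
    scaled = 4.0 * n_e * s
    if scaled < -700.0:
        return 0.0                      # exp() would overflow; fixation is negligible
    numerator = -math.expm1(-2.0 * s)   # 1 - exp(-2s)
    denominator = -math.expm1(-scaled)  # 1 - exp(-4 * n_e * s)
    return numerator / denominator


def relative_to_neutral(s, n_e):
    """Fixation probability divided by the neutral expectation 1/(2 * n_e)."""
    return fixation_probability(s, n_e) * 2.0 * n_e


s = -1e-6  # hypothetical slightly deleterious duplication
for species, n_e in (("C. elegans", 8.0e4),
                     ("D. melanogaster", 1.15e6),
                     ("S. cerevisiae", 3.3e7)):
    print(f"{species}: 2*N_e*|s| = {2 * n_e * abs(s):.2f}, "
          f"fixation relative to neutral = {relative_to_neutral(s, n_e):.3g}")
```

Under these assumptions, the same mutation fixes at close to the neutral rate in the smallest population, where 2N_e|s| is well below one, and essentially never in the largest, where 2N_e|s| is far above one.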

Empirical estimates of the spontaneous duplication rate, be 
they locus-specific or genome-wide from MA studies, invariably 
exceed estimates from analyzing the age distribution of gene dupli- 
cates in sequenced genomes. What may explain this discrepancy, 
with empirical estimates exceeding bioinformatically based ones 
by two to four orders of magnitude? We have previously proposed 
that the degree of discrepancy between bioinformatic and empirical
estimates of the gene duplication rate is influenced by differences in
the efficacy of selection among species due to their varying N_e (Katju
et al., 2009; Lipinski et al., 2011; Katju, 2012). Specifically, slightly
deleterious CNVs will be efficiently weeded out in species with
large N_e but are more likely to survive the onslaught of purifying
selection in species with small N_e. Currently, bioinformatic and
spontaneous empirical estimates of the gene duplication rate are
only available for three species, S. cerevisiae, D. melanogaster, and
C. elegans, with estimated N_e of 3.3 × 10^7, 1.15 × 10^6, and 80,000
individuals, respectively (Lipinski et al., 2011; Katju, 2012 and
references therein). The empirical estimates of the duplication rate
exceed the bioinformatic estimates by 36,000-, 660-, and 340-fold
for S. cerevisiae, D. melanogaster, and C. elegans, respectively. This
discrepancy correlates positively with the species N_e, as we have
previously predicted (Lipinski et al., 2011). A more robust test
of this hypothesis will require greater sampling of the empirical 
genome-wide duplication rates across more species. 
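
The positive relationship can be inspected with the three data points quoted above. The sketch below checks whether the species rank in the same order by N_e and by fold-discrepancy, and computes a Pearson correlation on log10-transformed values. The choice of a log-log comparison is an assumption made here for illustration, and with only three species the result is descriptive rather than a statistical test.

```python
import math

# Values quoted in the text.
effective_size = {
    "S. cerevisiae": 3.3e7,
    "D. melanogaster": 1.15e6,
    "C. elegans": 8.0e4,
}
fold_discrepancy = {  # empirical rate / bioinformatic rate
    "S. cerevisiae": 36_000,
    "D. melanogaster": 660,
    "C. elegans": 340,
}


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


species = list(effective_size)
log_ne = [math.log10(effective_size[sp]) for sp in species]
log_fold = [math.log10(fold_discrepancy[sp]) for sp in species]

# Do both quantities rank the species identically?
same_rank_order = (sorted(species, key=effective_size.get)
                   == sorted(species, key=fold_discrepancy.get))
r_log = pearson(log_ne, log_fold)

print(f"same rank order: {same_rank_order}")
print(f"Pearson r on log10 values: {r_log:.2f}")
```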

CONCLUDING REMARKS 

Gene CNVs are of fundamental importance for genetic variation 
in populations, genome evolution and the evolution of genes with 
novel functions. When the first genome-wide estimates of the 
spontaneous duplication rate were bioinformatically determined 
from sequenced genomes, they were reported as being similar 
to the point mutation rates (Lynch and Conery, 2000). These 
rates were hailed as being "astronomical" (Pennisi, 2000). Direct 
empirical estimates of spontaneous duplication rates derived from 
experimental MA lines have been demonstrated to be orders of 
magnitude higher. The discrepancy between the bioinformati- 
cally derived and empirical duplication rates suggests that the vast 
majority of gene duplications are deleterious and rapidly erad- 
icated from genomes before being afforded any opportunity to 
impart a genomic signature of their all too brief existence. This 
discrepancy between bioinformatically and empirically derived 
estimates of the duplication rate also appears to be positively
correlated with the species N_e. Prokaryotes and unicellular eukaryotes
with large N_e and greater efficacy of selection are expected to
rapidly purge even mildly deleterious duplicates. Conversely, in
organisms with small N_e, such as many multicellular eukaryotic
species, genetic drift is expected to play an integral role in the accu- 
mulation of gene duplicates leading to the eventual preservation 
of duplicates following functional divergence. 

The last decade or so has witnessed a revolution in the cata- 
loging of structural variants in species, both at the population- 
and genomic-level. Structural variants, however, present multiple 
challenges in the analysis of their dynamics in populations and 
the evolutionary forces responsible for their ultimate fate in 
genomes. Whereas standard population-genetic theory is well- 
equipped to analyze the frequency of alleles or base substitutions 
in populations, CNVs of particular genes can have breakpoints 
in different locations, and duplicated genes can have additional 
variation with respect to genomic location and transcriptional 
orientation, all of which can differentially influence their func- 
tion. In this review, we have not tackled issues relating to 
the structural complexity of CNVs. Gene duplicates, for exam- 
ple, exhibit varying degrees of structural resemblance to their 
progenitor loci (Katju and Lynch, 2003, 2006; Katju, 2012). 
An advanced understanding of how the structural resemblance 
between paralogs influences their eventual fate (pseudogenization, 
subfunctionalization, or neofunctionalization) must precede and 
is germane to elucidating the full contribution of CNVs to genome 
evolution. 



Although most CNVs appear to be selected against, we need 
more information about their distribution of fitness effects, and 
what particular aspects of their genomic and molecular structure 
underlie these phenotypic fitness costs/gains. Are duplication and 
deletion rates species-specific and if so, do these show a depen- 
dence on the structural features of a genome, say the fraction of 
repetitive sequences within a genome? Furthermore, how do these
high rates of duplication influence the fate of duplicated genes
in populations via natural selection or genetic drift? One
consequence of a high duplication rate is that adaptive variation in gene
dosage can frequently arise by duplications. One of the important
questions regarding the evolution of novel genes is how often
this kind of selection for higher gene dosage results in functional 
divergence, for example, because of adaptive enhancement of sub- 
functions or promiscuous activity. Or is selection for gene dosage 
just a temporary response to ephemeral environmental challenges 
and do duplicates revert back to existence in single-copy form 
when these challenges no longer exist? 